Ch 12 -- Understanding Relational Databases

Charlie Calvert's C++ Builder Unleashed

In order to make sure everyone is following the discussion in the next few chapters, I'm going to spend a few pages giving a quick-and-dirty introduction to relational databases. This discussion will also include a brief overview of the Database Desktop.

My purpose here is to give a relatively concise explanation of what it means to use a relational, as opposed to a flat-file, database. Naturally, this will be a very broad overview of a complex and highly detailed subject. I am not attempting an academic analysis of this field of study, but instead want to provide a practical guide for everyday use.

In this chapter, I will be working with Paradox tables and InterBase tables. Each database has its own unique set of rules. There is no definitive example of a relational database, any more than there is a definitive operating system or a definitive compiler. All databases have things in common, just as all compilers and all operating systems have things in common. As much as possible, I try to stress these common traits throughout this chapter. However, the specific implementation that I am referencing here is for Paradox and InterBase databases, and not everything I say will apply to Oracle or dBASE tables.

In particular, this chapter is about the following:

Indices

Primary keys

Foreign keys

Referential integrity

If you already understand these subjects, you probably won't have much use for this chapter. If you need to review these subjects, or need to be introduced to them, you should read this chapter.

Getting Started with Relational Databases

There are many different kinds of possible databases, but in today's world, there are only two kinds that have any significant market share for the PC:

1. Flat-file databases

2. Relational databases

NOTE: A new kind of system called the object-oriented database has emerged in recent years. These databases represent an interesting form of technology, but I will omit discussion of them here because they have a small user base at this time.

The subject of object-oriented databases will come up again briefly in the chapters on OOP called "Inheritance," "Encapsulation," and "Polymorphism." In those chapters you will see that OOP has some powerful features that it can bring to the database world.

Flat-file databases consist of a single file. The classic example would be an address book that contains a single table with six fields in it: Name, Address, City, State, Zip, and Phone. If that is your entire database, what you have is a flat-file database. In a flat-file database, the words table and database are synonymous.

In general, relational databases consist of a series of tables related to each other by one or more fields in each table. In Chapter 9, "Using TTable and TDataSet," and Chapter 10, "SQL and the TQuery Object," you saw how to use the TTable and TQuery objects to relate the Customer and Orders tables together in a one-to-many relationship. As you recall, the two tables were joined on the CustNo field. The relationship established between these two tables on the CustNo field is very much at the heart of all relational databases.

The Address program shown in Chapter 13, "Flat-File, Real-World Databases," is an example of a flat-file database. In Chapter 14, "Sessions and Relational Real-World Databases," you will see a second program, called KDAdd, which is a relational database.

Here are three key differences between relational and flat-file databases:

1. A flat-file database, like the address book example outlined previously, consists of one single table. That's the whole database. There is nothing more to say about it. Each table stands alone, isolated in its own little solipsistic world.

2. Relational databases typically contain multiple tables. For instance, the Customer and Orders tables are both part of the BCDEMOS database. As you will see, there are many other tables in this database, but for now just concentrate on the Customer and Orders tables.

3. Tables in relational databases are tied together by special fields. These fields are called primary and foreign keys. They are usually indexed, and they usually consist of a simple integer value. For instance, the Customer and Orders tables are related to one another by the CustNo field. The CustNo field is a primary key in the Customer table, and a foreign key in the Orders table. There are also indexes on both fields.

NOTE: Indices are about searching and sorting. Keys, on the other hand, are about relating tables, and particularly about something called referential integrity.

In practice, these concepts get mixed together in some pretty ugly ways, but the underlying theory relies on the kind of distinctions I am drawing in this note. For instance, keys are usually indexed, and so people often talk about keys and indexes as if they were the same thing. However, they are distinct concepts.

One way to start to draw the distinction is to understand that keys are part of the theory of relational databases, while indexes are part of the implementation of relational databases. More on this as the chapter evolves.

Clearly relational databases are radically different from flat-file databases. Relational databases typically consist of multiple tables, at least some of which are related together by one or more fields. Flat-file databases, on the other hand, consist of only one single table, which is not related to any other table.

Advantages of the Relational Database Model

What advantages do relational databases have over flat-file databases? Well, there are many strengths to this system; here are a few of the highlights:

Relational databases enforce something called referential integrity. These constraints help you enter data in a logical, systematic, and error-free manner.

Relational databases save disk space. For instance, the Customer table holds information about customers, including their address, phone, and contact information. The Orders table holds information about orders, including their date, cost, and payment method. If you were forced to keep all this information in a single table, each order would also have to list the customer information, which would mean that some customers' addresses would be repeated dozens of times in the database. In a big database, that kind of duplication can easily burn up megabytes of disk space. It's better to use a relational database because each customer's address would be entered only once. You could also have two flat-file databases, one holding the customer information and the other holding the orders information. The problem with this second scenario is that flat-file databases provide no means of relating the two tables so that you can easily see which orders belong to which customer.

Relational databases enable you to create one-to-many relationships. For instance, you can have one name that is related to multiple addresses. There is no simple way to capture that kind of relationship in a flat-file database. In the KDAdd program, you will see that it is possible to easily relate multiple addresses, phone numbers, and so on with each name. The flexible structure of relational databases enables programmers to adapt to these kinds of real-world situations. For many entries in a database, you will want to keep track of two addresses: one for a person's home, and the other for his or her work. If someone you know has a summer home or an apartment in the city, you need to add yet more addresses to the listing. There is no convenient way to do that in flat-file databases. Relational databases handle this kind of problem with ease. In the last paragraph I emphasized that this kind of feature saves space; in this paragraph I'm emphasizing that it allows for a more logical, flexible, and easy-to-use arrangement of your data.
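To make the space argument a little more tangible, here is a minimal sketch in standard C++ (not Paradox, InterBase, or the VCL). The struct names, field names, and the addressCopies helper are hypothetical and are not taken from the BCDEMOS schema. They only contrast a flat layout, where each order repeats the customer's address, with a relational layout, where the address is stored once and each order points back to it through CustNo.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Flat-file layout: every order row repeats the customer's address.
struct FlatOrder {
    int orderNo;
    std::string customerName;
    std::string customerAddress;  // duplicated on every order
};

// Relational layout: the address lives once in Customer, and
// each Order refers back to it through CustNo.
struct Customer {
    int custNo;                   // primary key
    std::string name;
    std::string address;
};

struct Order {
    int orderNo;                  // primary key
    int custNo;                   // foreign key into Customer
};

// Counts how many times a given address is stored in the flat layout.
std::size_t addressCopies(const std::vector<FlatOrder>& rows,
                          const std::string& address) {
    std::size_t copies = 0;
    for (const FlatOrder& row : rows) {
        if (row.customerAddress == address) ++copies;
    }
    return copies;
}
```

In the flat layout the number of stored copies of an address grows with the number of orders. In the relational layout it stays at one, no matter how many Order records carry that customer's CustNo.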

To summarize, a relational database offers these possibilities:

You can view the Customer table alone, or you can view the Orders table alone.

You can place the two tables in a one-to-many relationship, so that you can see them side-by-side, but only see the orders relating to the currently highlighted customer.

You can perform a join between the two tables, so that you see them as one combined table, much like the combined table you would be forced to use if you wanted to "join" the Customer and Orders tables in a single flat-file database. However, you can decide which fields from both tables will be part of the join, leaving out any you don't want to view. The joined table is also temporary, and does not take up unnecessary disk space. In short, relational databases can use joins to provide some of the benefits of flat-file databases, whereas flat-file databases cannot emulate the virtues of relational databases.
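The join described in the last item can be sketched in a few lines of standard C++. This is only an illustration under simplifying assumptions: the structs, the joinOnCustNo function, and the choice of fields are hypothetical, and a real database engine would use an index rather than the nested loop shown here.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Customer { int custNo; std::string company; };
struct Order    { int orderNo; int custNo; double amount; };

// One row of the joined result: only the fields we chose to keep.
struct JoinedRow { std::string company; int orderNo; double amount; };

// Joins Customer and Orders on CustNo. The result is built on demand
// and goes away when the caller is done with it.
std::vector<JoinedRow> joinOnCustNo(const std::vector<Customer>& customers,
                                    const std::vector<Order>& orders) {
    std::vector<JoinedRow> result;
    for (const Customer& c : customers) {
        for (const Order& o : orders) {
            if (o.custNo == c.custNo) {
                result.push_back(JoinedRow{c.company, o.orderNo, o.amount});
            }
        }
    }
    return result;
}
```

Note that JoinedRow leaves out CustNo entirely. That is the "decide which fields will be part of the join" point: the combined view contains only what you ask for, and the two underlying tables are left untouched.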

As you can see, the three concepts that stand out when talking about relational databases are referential integrity, flexibility, and conservation of disk space. In this case, the word "flexibility" covers a wide range of broad features that can only be fully appreciated over time.

The one disadvantage that relational databases have when compared to flat-file databases is that they are more complicated to use. This is not just a minor sticking point. Neophytes are often completely baffled by relational databases. They don't have a clue as to what to do with them. Even if you have a fair degree of expertise, you can still become overwhelmed by a relational database that consists of three dozen tables related to one another in some hundred different ways. (And yes, complexity on that scale is not uncommon in corporate America!) As you will see later in the book, almost the only way to work with big systems of that type is through CASE tools.

Simple Set Logic: The Basis of Relational Databases

The basis for relational databases is a very simple form of mathematics. Each table represents a simple set that can be related to other tables through very fundamental mathematics. Because computers are so good at math, and particularly at integer math, they find relational databases easy to manipulate.

One common feature of relational databases is that most records will have a unique number associated with them, and these numbers will be used as the keys that relate one table to another. This enables you to group tables together using simple mathematical relationships. In particular, you can group them using simple integer-based set arithmetic.

For instance, in the Customer table from BCDEMOS, there is a unique CustNo field in each record. Furthermore, the Orders table has a unique OrderNo field associated with it. The Orders table also has a CustNo field that will relate it to the Customer table. The terminology of relational databases expresses these ideas by saying that the Customer table has a primary key called CustNo, and the Orders table has a primary key called OrderNo and a foreign key called CustNo:

Tablename    Primary key    Foreign key (secondary index)
Customer     CustNo
Orders       OrderNo        CustNo

Given this scenario, you can say "Show me the set of all orders such that their CustNo field is equal to X or within the range from X to Y." Computers love these kinds of simple mathematical relationships. It's their bread and butter. In essence, you are just asking for the intersection of two sets: "Show me the intersection of this record from the Customer table with all the records from the Orders table." This intersection will consist of one record from the Customer table with a particular CustNo plus all the records from the Orders table that have the same CustNo in their foreign key.
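Here is a small sketch of that kind of request in standard C++. The Order struct and the ordersInRange function are hypothetical stand-ins for a table and a query. The point is only that the whole selection comes down to a pair of integer comparisons on the CustNo field.

```cpp
#include <cassert>
#include <vector>

struct Order { int orderNo; int custNo; };

// "Show me the set of all orders whose CustNo lies between lo and hi."
// Passing the same value for lo and hi selects a single customer.
std::vector<Order> ordersInRange(const std::vector<Order>& orders,
                                 int lo, int hi) {
    assert(lo <= hi);  // the bounds must be given in order
    std::vector<Order> selected;
    for (const Order& o : orders) {
        if (o.custNo >= lo && o.custNo <= hi) selected.push_back(o);
    }
    return selected;
}
```

Calling ordersInRange(orders, 1000, 1000) gives the orders for the one customer whose CustNo is 1000. Widening the bounds gives the orders for every customer in that range.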

These CustNo, OrderNo, AuthorNo, BookNo, and similar fields might also be used in flat-file databases as indexes, but they play a unique role in relational databases because they are the keys used to relate different tables. They make it possible to reduce the relationship between tables to nothing more than a simple series of mathematical formulas. These formulas are based on keys rather than on indexes. It is merely a coincidence that most keys also happen to be indexed.

Viewing Indices and Keys in DBD or the Explorer

In the next few sections I define primary and foreign keys, and describe how to use them. It might be helpful if I preface this discussion with a brief description of how to view keys using some of the tools that ship with BCB. This is just a preliminary look at this material. I cover it again in greater depth later in this chapter in a section called "Exploring the Indices in the BCDEMOS Database."

NOTE: Right now it is not so important that you understand what primary and foreign keys do, but only that you know how to view them using the tools that ship with the product. The theory will become clear as the chapter progresses.

There are two ways to view the indexes and keys on a table. The best way is in the Database Explorer. Open up the Explorer and view the BCDEMOS database as shown in Figure 12.1.

Click the Orders table and open up the Referential Constraints branch as shown in Figure 12.2. Notice that there are two constraints on this table, one called RefCustInOrders and the second called RefOrders. The RefCustInOrders constraint defines CustNo as a foreign key that relates to the CustNo field in the Customer table.

A second way to view this key is in the Database Desktop. Set the Working Directory from the File menu to BCDEMOS. Open up the Orders table in the Database Desktop and select Table | Info Structure from the menu. Drop down the Table Properties and select Referential Integrity, as shown in Figure 12.3.

FIGURE 12.1. Viewing the BCDEMOS database in the Database Explorer.

FIGURE 12.2. The primary and foreign fields of the Orders table.

FIGURE 12.3. Selecting Referential Integrity in the Database Desktop.

Double-click RefCustInOrders to bring up the Referential Integrity dialog shown in Figure 12.4.

FIGURE 12.4. The CustNo field in the Orders table relates to the CustNo field in the Customer table.

The fields on the left side of this dialog belong to the Orders table. On the right is a list of all the tables in the database. In the center, you can see that the CustNo field has been selected from the Orders table and the CustNo field has been selected from the Customer table. The primary key of the Customer table is related to the foreign key of the Orders table.

Now go back to the Database Explorer and open up the Indices branch of the Orders table, as shown in Figure 12.5.

Note that you can see the names of the indexes, here labeled as <primary> and as CustNo. The fields found in the indexes are also displayed. For instance, you can see that the primary index consists of the OrderNo field and the secondary index consists of the CustNo field.

FIGURE 12.5. The primary and CustNo indexes on the Orders table.

I am showing these to you so that you will begin to see the distinction between keys and indexes. The two concepts are distinct. For further proof of this, open up the IBLOCAL database in the Database Explorer. Use SYSDBA as the user name, and masterkey as the password. Now open up the Employee_Project table as shown in Figure 12.6. Note that there are separate listings for the indexes, primary key, and foreign keys.

FIGURE 12.6. The Employee_Project table has three indexes, one primary key, and two foreign keys.

In practice, almost all keyed fields will also have indexes. This leads people to think the two concepts are the same. However, indexes are about searching and sorting, and keys are about referential integrity. These distinctions will become blurred at times, but it helps if you can keep it in your mind that they are different ideas. The actual details concerning these distinctions will become clear in the next few pages.

You can also see the indexes for a table inside the Database Desktop. To get started, open up the Orders table and select Table | Info Structure from the menu. The fields with the stars beside them are part of the primary index. Drop down the Table Properties combo box to view the secondary indexes. Double-click the indexes you see to view the details of their design. If you want to change the structure of a table, choose Table | Restructure from the menu, rather than Table | Info Structure.

Most of the time, I find the Database Desktop is the right tool to use when I want to create or modify a table, and the Database Explorer is the right tool to use when I want to view the structure of a table. However, I often find myself jumping back and forth between the two tools, to get the best features of each. Later in the book I will talk about CASE tools, which are generally superior to either of the products discussed in this section. However, there are no CASE tools that ship with BCB, so I emphasize the universally available tools in this text.

Throughout the ensuing discussion, you might have occasion to use the Database Explorer to examine the structure of the Customer, Orders, Items, and Parts tables. These are the tables I use when defining what relational databases are all about.

Rule Numero Uno: Create a Primary Key for Each Table!

The last two sections have introduced you to some of the key concepts in relational databases. If there is one lesson to take out of this chapter, it is the importance of creating a unique numerical key in the first field of most tables you create. This field is called a primary key. In both Paradox and InterBase, it is impossible to create a primary key without also simultaneously creating an index.

If you want to have a list of addresses in a table, don't just list the Address, City, State, and Zip. Be sure to also include an integer-based CustNo, AddressNo, or Code field. This field will usually be indexed, and it will usually be the first field of the table. It is the primary key for your table, and must be, by definition, unique. That is, each record should have a unique Code field associated with it.

The primary key

Serves as the means of differentiating one record from another

Is also used in referential integrity

Can also help with fast searches and sorts

As I said earlier, the distinction between indexes and keys becomes blurred at times. However, they are distinct concepts and you should endeavor to discover the differences.

NOTE: In this discussion I am taking a liberty in saying that you have to create a primary key for a table in a relational database. In fact, you can simply create a field that contains a unique integer value. It doesn't have to be an index. However, making a unique index for the field will speed up the operation of your database, and it will help enforce rules that make it easy to create robust relational databases. In particular, the constraints on a primary key make it impossible for you to create two records in one table with the same primary key.

Just to make sure this is clear, I'll go ahead and list out the right and wrong way to create a table.

Right Method

CustNo: Integer
LastName, FirstName, Address, City, State, Zip: string

Wrong Method

LastName, FirstName, Address, City, State, Zip: string

The first example is "correct" because it has a primary index called CustNo. It is declared as a unique Integer value. The second example is "wrong" because it omits a simple numerical field as the primary index.

NOTE: I put the words "correct" and "wrong" in quotes because there really are no hard-and-fast rules in this discipline. There are occasions when you might not want to create a table that has a simple integer as a primary index. However, ninety-nine percent of the time, that's exactly what you want to do.
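To make the uniqueness rule concrete, here is a minimal sketch in standard C++ rather than in Paradox or the VCL. The AddressTable class and its field names are hypothetical. A std::map simply plays the role of the unique index that comes with a primary key, so this is an analogy for the behavior and not a description of how any database engine is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

struct Address { std::string lastName; std::string street; };

// A toy table keyed on an integer CustNo. std::map stands in for the
// unique index that a primary key implies.
class AddressTable {
public:
    // Returns false, leaving the table unchanged, if custNo is taken.
    bool insert(int custNo, const Address& row) {
        return rows_.insert(std::make_pair(custNo, row)).second;
    }

    std::size_t size() const { return rows_.size(); }

private:
    std::map<int, Address> rows_;
};
```

Two records can hold exactly the same name as long as their CustNo values differ, but a second record with an existing CustNo is turned away. The "wrong" layout has no field that could play this role.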

At the height of a warm May spring day, there is such a thing as a rose bush that has no buds or flowers. However, the whole point of rose bushes in May is that they flower. I doubt we would feel quite the same way about roses if they did not have beautiful blooms. In the same way, relational databases without primary indexes wouldn't garner quite so much attention as they do now.

I should add that not all primary indexes are numeric fields. For instance, many tables might use alpha fields containing values such as HDA1320WW35180. I'm stressing simple numeric fields in this chapter because they are easy to work with and easy to understand.

Even if you don't yet understand how databases work, for now I would suggest automatically adding a simple numerical value in a primary index to all your tables. Do so even if you are not using the field at this time. Believe me, as you come to understand relational databases, you will see why I recommend doing this in most, though not all, cases. At this point, however, you will probably be better off creating the extra field and letting it go to waste, even if you don't understand why you are doing it. After you get a better feeling for relational databases, you will understand intuitively when the field is needed, and when you are encountering one of those rare occasions when it is going to be useless.

When people first work with relational databases, they can get a little hung up about the overhead involved in creating all these extra key fields. The point to remember is that these fields allow the database to be treated as nothing more than sets of simple integers related together in various combinations. Computers fly through integer math. Adding these extra index fields to your tables makes your data become computer-friendly. Computers love those simple integer fields; your computer will show its thanks by running faster if you add them to your tables!

Computers don't feel weighed down by the extra field any more than a car feels weighed down by a steering wheel, people feel weighed down by their hands, or a rose bush feels weighed down by a rose. Relational databases want you to add an extra integer field as a primary index to your tables!

Remember, people like beautiful paintings, eloquent words, and lovely members of the opposite sex. Computers like logic. They like numbers, they like nice, clean, easily defined relationships! They like simple, integer-based primary keys in the first field of a table!

One-to-Many Relationships: The Data and the Index

One good way to start to understand relational databases is by working with the Customer, Orders, Items, and Parts tables from the BCDEMOS database. All four of these tables are related in one-to-many relationships, each-to-each. That is, the Customer table is related to the Orders table, the Orders table to the Items table, and the Items table to the Parts table. (The relationship also works in the opposite direction, but it may be simpler at first to think of it as going in only one direction.)

Master      Detail    Connector (primary key and foreign key)
Customer    Orders    CustNo
Orders      Items     OrderNo
Items       Parts     PartNo

Read the preceding table one row at a time, from left to right, as if each row were a sentence. The first row shows that the Customer and Orders tables are related in a one-to-many relationship, with Customer being the master table and Orders being the detail table. The connector between them is the CustNo field. That is, they both have a CustNo field.

The CustNo field is the primary key of the Customer table and the foreign key of the Orders table. The OrderNo field is the primary key of the Orders table and a foreign key of the Items table. The PartNo field is the primary key of the Parts table and a foreign key of the Items table.

The relationship between these tables can be reversed. For instance, the Parts table could become the master table and the Items table the detail table, and so on, back down the line. The reason you can reverse the relationship becomes clear when you think in purely mathematical terms. The Customer table has a series of CustNo fields. Say the CustNo for the first record is 1000. To get the Orders associated with that customer, you ask this question: "What are all the rows from the Orders table that have a CustNo of 1000?" That is:

Select * from Orders where CustNo = 1000

Clearly, you could reverse this question. If you select a particular row from the Orders table, you could find which record from the Customer table it is related to by asking for the set of all Customer records with a CustNo of 1000. Because the CustNo field for the Customer table is a unique index, you will get only one record back. However, the way you relate the tables is still the same:

Select * from Customer where CustNo = 1000
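The reversed lookup can be sketched in standard C++ as well. The names here are hypothetical, and a std::map keyed on CustNo stands in for the unique index on the Customer table. Because the key is unique, the function can hand back at most one record, which is why it returns a single pointer rather than a list.

```cpp
#include <cassert>
#include <map>
#include <string>

struct Customer { int custNo; std::string company; };
struct Order    { int orderNo; int custNo; };

// Detail to master: find the customer that an order belongs to.
// Returns nullptr if no Customer record carries the order's CustNo.
const Customer* customerFor(const std::map<int, Customer>& customers,
                            const Order& order) {
    std::map<int, Customer>::const_iterator it = customers.find(order.custNo);
    return it == customers.end() ? nullptr : &it->second;
}
```

The comparison is the same one used when going from a customer to its orders: match on CustNo. Only the side that is guaranteed to be unique changes.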

Working with Primary Keys

The Parts, Orders, Items, and Customer tables have various keys. As it happens, these keys are also indexes. An index enables you to sort tables on a particular field. A key helps you define the relationship between two tables, or otherwise group related bits of information by a set of predefined and automatically enforced rules.

Unfortunately, sadly, and confusingly, you can still relate tables even without the presence of any keys or indexes. For instance, if there were no CustNo primary and foreign keys in the Customer and Orders tables, Paradox would still let you use SQL to relate the tables in a one-to-many relationship. However, in this scenario, performance would be slow because there is no index, and there would be no constraints on the data you could enter in the two tables because there would be no primary and foreign keys that define referential integrity. In this scenario you are back to the rosebush-without-a-rose phenomenon. Yes, the tables are still part of a relational database, but they lack the features that make a relational database appealing. You need both the keys and the indexes to make a relational database appealing.

I'll draw a distinction between only two different kinds of keys. The first kind I will discuss is called a primary key. The second is called a foreign key.

A primary key is a unique value used to identify a record in a table. It is usually numerical, and it is usually indexed. It can be combined with a foreign key to define referential integrity. I will talk more about referential integrity later in this chapter.

Because it is indexed, the primary key defines the default sort order for the table. When you first open up a table, it will be automatically sorted on this field. If a table does not have a primary index, records will appear in the order in which they were added to the table. For all practical purposes, a table without an index has no defined order in which records will appear.
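The difference between "order of entry" and "order of the primary index" is easy to see in a small standard C++ sketch. The names are hypothetical: a std::vector keeps records in the order they were added, while a std::map keyed on CustNo behaves like a primary index and always hands records back in key order.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct Customer { int custNo; std::string company; };

// With a primary index on CustNo, rows are visited in key order,
// regardless of the order in which they were inserted.
std::vector<int> custNosInIndexOrder(const std::vector<Customer>& inserted) {
    std::map<int, std::string> primaryIndex;
    for (const Customer& c : inserted) primaryIndex[c.custNo] = c.company;

    std::vector<int> order;
    for (const auto& entry : primaryIndex) order.push_back(entry.first);
    return order;
}
```

If the records were added as 3000, 1000, 2000, the vector still reads 3000, 1000, 2000, but the indexed view reads 1000, 2000, 3000.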

With Paradox tables, each entry in the primary index must be unique. That is, you can't have two CustNos in the Customer table that are the same. Foreign key values, however, need not be unique: many records in the Orders table can carry the same CustNo.

It is legal to have multiple fields in the primary index of a Paradox table. This is called a composite index. These fields must be sequential, starting with the first field in the table. You can't have the primary index consist of the first, third, and fourth fields of a table. A composite index with three fields must consist of the first, second, and third fields. If you have a FirstName and a LastName field in your database, they can both be part of the primary index. You should, however, declare the LastName before the FirstName, so that your index will list people alphabetically by last name: CustNo, LastName, FirstName.
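Why the order of the fields matters is easiest to see in code. The following is a sketch in standard C++ with hypothetical names. It leaves CustNo out and looks only at the LastName and FirstName pair. The comparison runs field by field from left to right, which is an assumption about how a composite index typically sorts, not a statement about Paradox internals.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

struct Person { std::string lastName; std::string firstName; };

// Composite ordering: compare LastName first, and fall back to
// FirstName only when the last names are equal.
bool byLastThenFirst(const Person& a, const Person& b) {
    return std::tie(a.lastName, a.firstName) <
           std::tie(b.lastName, b.firstName);
}

// Returns a copy of the list sorted the way the composite ordering dictates.
std::vector<Person> sortedByName(std::vector<Person> people) {
    std::sort(people.begin(), people.end(), byLastThenFirst);
    return people;
}
```

Swapping the two fields inside std::tie would sort the list by first name instead, which is exactly the mistake that declaring FirstName before LastName would cause.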

The primary and foreign keys I use to relate tables in this chapter are not composite. Each consists of a single field.

Creating a primary key enables you to have two people with the same name, but with different addresses. For instance, you can list a John Doe on Maple Street who has a CustNo of 25, and a John Doe on Henry Street who has a CustNo of 2000. The names may be the same, but the database can distinguish them by their CustNo. Once again, this shows why databases love those simple integer indexes. If the database had to sort on the address fields every time it tried to distinguish these two John Does, it would take a long time for the sort to finish.

Computers can easily distinguish the number 25 from the number 2000, but it takes them longer to do a string compare on "Maple Street" and "Henry Street". Furthermore, just comparing the streets wouldn't be enough; you would also have to compare cities, states, and so on. If two entries with the same name were both missing addresses, the whole system would be in danger of falling apart altogether. The same thing would happen if two people named John Doe lived at the same address. Use those integer indexes; they make your life simpler!

Working with Secondary Indices and Foreign Keys

It's now time to move on to a consideration of foreign keys. The CustNo field of the Orders table is a foreign key because it relates the Orders table to the primary key of the Customer table. It is also a secondary index which aids in sorting and searching through data. Indices also speed up operations such as joins and other master-detail relationships.

When writing this section, I found it difficult to totally divorce the ideas of foreign keys and secondary indexes. However, I will try to split them up into two categories, taking foreign keys first:

A foreign key provides a means for relating two tables according to a set of predefined rules called referential integrity.

In Paradox, you use the Referential Integrity tools from the Database Desktop to define foreign keys. The foreign keys discussed in this chapter each consist of a single field.

Using SQL you can relate two tables in a one-to-many relationship even if there is no index or key in either table. However, your performance will be better if you have indexes. There will be no way to enforce referential integrity if you don't define foreign and primary keys.

Using the TTable object, it is impossible to relate two tables in a one-to-many relationship without indexes. (This is one of the points that doesn't clearly belong in either the section on keys, or the one on indexes. It relates to both subjects.)
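As a rough picture of what "enforcing referential integrity" means, here is a minimal sketch in standard C++. The Database class and its member names are hypothetical. The two rules it applies (no order without a matching customer, and no deleting a customer who still has orders) are common referential integrity policies that I am assuming for illustration. They are not a listing of the exact rules Paradox or InterBase apply.

```cpp
#include <cassert>
#include <set>
#include <vector>

struct Order { int orderNo; int custNo; };

class Database {
public:
    void addCustomer(int custNo) { customerKeys_.insert(custNo); }

    // Rejects an order whose CustNo has no matching Customer record.
    bool addOrder(const Order& order) {
        if (customerKeys_.count(order.custNo) == 0) return false;
        orders_.push_back(order);
        return true;
    }

    // Refuses to delete a customer that still has orders (assumed policy).
    bool removeCustomer(int custNo) {
        for (const Order& o : orders_) {
            if (o.custNo == custNo) return false;
        }
        return customerKeys_.erase(custNo) == 1;
    }

private:
    std::set<int> customerKeys_;  // primary key values of Customer
    std::vector<Order> orders_;   // each row carries a foreign key
};
```

Without the check in addOrder you could still relate the two tables, just as the SQL case above describes, but nothing would stop an order from pointing at a customer that does not exist.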

Here are some facts about secondary indexes:

A secondary index provides an alternative sort order to the one provided by the primary key.

You need to explicitly change the index if you want to switch away from the primary index to a secondary index. Remember that the default sort order for a Paradox table is provided by the primary index. If you want to switch from the primary index to a secondary index, you need to change the IndexName or IndexFieldNames property of your table. If you want to use the primary index, you don't have to do anything; the table will sort on that field automatically.

An index that contains more than one field is called a composite index. You can create composite secondary indexes, which means the indexes will contain multiple fields. In practice, fields such as FirstName and LastName can often be part of a secondary index, because your primary index is usually a unique numerical value. Sometimes a primary index will consist of three fields, such as the CustNo, FirstName, and LastName fields.

In Paradox tables all primary and foreign keys must be indexed. You can't define referential integrity without indexes, and in particular, you must have a primary key. Furthermore, in InterBase tables, the act of defining a primary or foreign key will automatically generate an index. (Once again, this is an item that doesn't clearly belong in either the discussion of keys or of indexes, but rather it relates to both. As I said earlier, there are times when the distinction between the two subjects becomes blurred.)
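The facts above can be pulled together in one small sketch, again in standard C++ with hypothetical names rather than in the VCL. A std::map keyed on OrderNo plays the part of the unique primary index, and a std::multimap keyed on CustNo plays the part of a secondary index whose values may repeat. Treat it as an analogy for how one table can be viewed through two different sort orders.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

struct Order { int orderNo; int custNo; };

class OrdersTable {
public:
    // Adds a row and updates both indexes. Fails on a duplicate OrderNo.
    bool insert(const Order& o) {
        if (!primary_.insert(std::make_pair(o.orderNo, o)).second) return false;
        byCustNo_.insert(std::make_pair(o.custNo, o.orderNo));
        return true;
    }

    // Default view: order numbers sorted by the primary index (OrderNo).
    std::vector<int> byPrimaryIndex() const {
        std::vector<int> out;
        for (const auto& e : primary_) out.push_back(e.first);
        return out;
    }

    // Alternative view: the same rows sorted by the secondary index (CustNo).
    std::vector<int> bySecondaryIndex() const {
        std::vector<int> out;
        for (const auto& e : byCustNo_) out.push_back(e.second);
        return out;
    }

private:
    std::map<int, Order> primary_;      // unique: one row per OrderNo
    std::multimap<int, int> byCustNo_;  // not unique: CustNo -> OrderNo
};
```

Choosing between byPrimaryIndex and bySecondaryIndex is roughly the decision you make when you leave a table on its primary index or point it at a secondary one. The data is the same, and only the order in which you see it changes.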

If you are new to databases, you will undoubtedly be frustrated to discover that different databases have varying rules for setting up indexes, keys, and so on. In this book, I tend to use Paradox tables as the default, but I also spend considerable time describing InterBase tables. If you use some other database, such as dBASE, Oracle, or Sybase, you should be sure to read up on the basic rules for using those tools. For instance, some databases let you set up a foreign key that is not an index. In the Paradox and InterBase world, however, foreign keys are always accompanied by an index, so the two words become synonymous, particularly in the hands of people who don't really understand how relational databases work.

The good news is that you will find that overall there are certain basic principles that define how databases work. The details may vary from implementation to implementation, but the fundamental ideas stay the same.
Keys Are the Keys to the Kingdom!

Let me take this whole paradigm even one step further. When I first looked at a database, I thought of it as a place to store information. After spending a lot of time with relational databases, I now think of them primarily as a way to relate bits of information through keys and indexes.

I know this is putting the cart before the horse, but what really interests me about databases now is not the fact that they contain information per se, but that I can query them to retrieve related bits of information. In other words, I'm more interested in the logic that defines how tables relate to one another than I am in the information itself.

No one can get excited about a list of addresses or a list of books. The lists themselves are very boring. What's interesting is the system of keys and indexes that relate tables together, and the various SQL statements you can use to ask questions against various sets of tables.

When I picture a table, I see its primary and foreign keys as great big pillars, and I envision all the rest of the data as a little stone altar that is dwarfed by the pillars. Like a pagan temple, it's the pillars that you notice first; the altar is just a small stone structure you might overlook until someone points it out. Of course the temple is built around the altar, and databases are built around their data. But in practice it is easy to overlook the data. You care about the pillars, and you care about the primary and foreign keys. The rest tends to fade into the background.

Give me a well-designed database with lots of interrelated tables and I can have fun asking it all sorts of interesting questions. It's not the data per se that is important, but the way the data is related!

The act of properly relating a set of tables in a database is called, tragically enough, "normalizing" the data. Where this dreadful term came from I have no idea, but "normalizing" a database is the fun part of creating a database application.
Exploring the Keys and Indices in the BCDEMOS Database

I am now going to look again at the tools that ship with BCB, and show how to use them to view and create indexes and keys. This examination of the subject will have greater depth than the quick overview presented earlier in this chapter.

Here is a list of the indexes on the Customer, Orders, Items, and Parts tables:

Table name   Primary indexes    Secondary indexes
Customer     CustNo             Company
Orders       OrderNo            CustNo
Items        OrderNo, ItemNo    OrderNo, PartNo
Parts        PartNo             VendorNo, Description

Notice that the Items table has a composite primary index consisting of the OrderNo and ItemNo fields. It also has two secondary indexes, one on the OrderNo field and one on the PartNo field. The Parts table has two secondary indexes, one on the VendorNo field and one on the Description field.

If you do not have a pre-made list like this one, you could find this information in at least four ways:

The Object Inspector

The Database Explorer

The Database Desktop

By creating a program that leverages the methods of the TSession object. Such a program will be shown in Chapter 14.

I will explain all these methods and then discuss some possible alternative techniques.

If you drag the Customer table off the Explorer and onto a form, you will be able to view its Indices in the Object Inspector. If you drop down the IndexName property editor, you will see that there is one index listed there. This is the secondary index, called ByCompany. If you select this index, the table will sort on the Company field.

If you set the IndexName property back to blank, the table will sort automatically on the primary index, which is the CustNo field. In other words, BCB never explicitly lists the primary index in the IndexName property editor. I suppose that the architects of the VCL assumed that all tables have a primary index, and that if you don't specify a particular index name, you want to sort on that index. Of course, it is not an error to create a table that has no primary index, and BCB can still work with that kind of table.

You can also drop down the IndexFieldNames property, which gives you a list of the fields that are indexed, in this case the CustNo and Company fields. Here you can see the fields included in the primary index, but they are not marked as belonging to any particular index.

NOTE: To study an interesting case, drop the Items table onto a form. Recall that it has a primary index on the OrderNo and ItemNo fields, and secondary indexes on the OrderNo and PartNo fields. If you drop down the IndexFieldNames property, you see the following list:

OrderNo
OrderNo;ItemNo
PartNo

The first item is the ByOrderNo index, the second is the primary index, and the third is the PartNo index.
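Each entry in that list names the fields of one index, with a semicolon separating the fields of a composite index. As a small sketch in standard C++, the helper below breaks such an entry into its individual field names. The function name splitIndexFields is my own invention and is not part of the VCL.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split an entry such as "OrderNo;ItemNo" into field names.
std::vector<std::string> splitIndexFields(const std::string& entry) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (start <= entry.size()) {
        std::string::size_type end = entry.find(';', start);
        if (end == std::string::npos) end = entry.size();
        // Trim surrounding spaces so "OrderNo; ItemNo" is handled too.
        std::string::size_type first = entry.find_first_not_of(' ', start);
        if (first != std::string::npos && first < end) {
            std::string::size_type last = entry.find_last_not_of(' ', end - 1);
            fields.push_back(entry.substr(first, last - first + 1));
        }
        start = end + 1;
    }
    return fields;
}
```

An entry with a single field, such as PartNo, comes back as a one-element list, so the same helper can be used for simple and composite indexes alike.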

The IndexName and IndexFieldNames properties give you a handy way of tracking Indices at design time. They don't, however, give you all the information you might need, such as exactly what fields make up which parts of the primary and secondary Indices. In this case, you could probably guess, but it would still be nice to get a more definitive answer.

If you open up the Database Explorer, expand the BCDEMOS node, the Tables node, the Customer node, and finally the Indices node, you get (naturally enough) a list of the Indices on the Customer table! This is a great feature, and you should use it whenever possible. Figure 12.7 shows the expanded nodes of the Indices for the Customer table. (The program kdAddExplore in the Chap14 subdirectory on the CD-ROM that accompanies this book uses the TSession object to do the same thing in a BCB program.)

While you have the Explorer open, you should also expand the Fields node, as shown in Figure 12.8. This gives a quick list of all the fields and their types. Notice that you can drag and drop individual fields onto a form.

A third way to get a look at the structure of a table is through the Database Desktop (DBD). You can open this program from the Tools menu in C++Builder. Use the File menu in the DBD to set the Working Directory to the BCDEMOS Alias. Open up the Customer table and choose the Table | Info Structure menu choice. Drop down the Table Properties combo box and look up the secondary Indices, as shown in Figure 12.9. The primary index is designated by the asterisks after the keyed fields in the Key Roster. In this case, only the CustNo field is starred, because it is the sole keyed field.

FIGURE 12.7. The Indices of the Customer table viewed in the Database Explorer.

FIGURE 12.8. The Fields view of the Customer table from the Database Explorer.

FIGURE 12.9. The Database Desktop struts its venerable features by displaying the Indices on the Customer table.

NOTE: Over time, the Database Desktop will probably be replaced entirely by the Explorer. However, there are still some things that the DBD does better than the Explorer, so both products are shipped with C++Builder.

Notice the Save As button on the Info Structure dialog. You can use this to save a table that contains the structure of the Customer table. You can then print this out on a printer using TQuickReports. Be sure to use a fixed-size font, not a proportional font:

Field Name         Type  Size  Key
CustNo             N           *
Company            A     30
Addr1              A     30
Addr2              A     30
City               A     15
State              A     20
Zip                A     10
Country            A     20
Phone              A     15
FAX                A     15
TaxRate            N
Contact            A     20
LastInvoiceDate    @

In the example shown here, I have printed out only the first four fields of the table because of space considerations. (The fields are Field Name, Type, Size, and Key.) If I then recursively print out the structure of the table used to house the structure of the Customer table, I get the following report:

Field Name             Type  Size  Key
Field Name             A     25
Type                   A     1
Size                   S
Key                    A     1
_Invariant Field ID    S
_Required Value        A     1
_Min Value             A     255
_Max Value             A     255
_Default Value         A     255
_Picture Value         A     176
_Table Lookup          A     255
_Table Lookup Type     A     1

This is the same information found in the Data Dictionary, and it should prove sufficient under most circumstances.
Using the Database Desktop to Create Indexes

To create a unique primary key in a Paradox table, open up the Database Desktop, and create a table with the first field declared as an Integer or autoincrement value. Place a star next to the first field, which tells Paradox to create a primary index on it, as shown in Figure 12.10.

FIGURE 12.10. Place asterisks next to the first field or fields of a table to designate the primary index.

To create a secondary index, drop down the table properties list and choose Secondary Indices. (See Figure 12.11.) Click the Define button. Select the fields from your table that you want to be part of your index. Click OK. A simple dialog will then pop up asking you to name the index. I usually give the index a name based on the fields being indexed. For instance, if I want to create an index on the CustNo field, I would call the index CustNo, CustNoIndex, or ByCustNo. If I wanted to create one on a field called Name, I would call the index Name, NameIndex, or ByName.

FIGURE 12.11. Creating a secondary index in a Paradox table.

Using the Database Desktop to Create Primary and Foreign Keys

To create a primary or foreign key on a Paradox table you need to define something called referential integrity. You cannot define referential integrity without first defining primary keys on both tables involved. There also must be an index on the foreign key, but this index will be created automatically for you when you create the foreign key.

In InterBase, the situation is somewhat different. The act of creating primary or foreign keys will automatically define indexes. As I said earlier, there are little variations on the main themes of relational databases, depending on what kind of database you use.

In the Data subdirectory from the CD that ships with this book you will find two tables called MasterTable and DetailTable. Figure 12.10 shows how to use the Database Desktop to create the MasterTable. These tables look like this, with the MasterTable listed first and the DetailTable listed second:

Field name    Type  Size  Primary index?
Code          +           *
Name          A     25

Field name    Type  Size  Primary index?
Code          +           *
MasterCode    I
SubName       A     25

To create referential integrity between these two tables, you should open up the DetailTable in the Database Desktop. Open the Table | Restructure menu item. Select Referential Integrity from the Table Properties combo box. Click the Define button, and set things up so they look like they do in Figure 12.12. Click the OK button and give this relationship a name, such as RefMasterDetail.

FIGURE 12.12. Defining referential integrity between the DetailTable and MasterTable.

When you are done, you will have created primary keys and foreign keys on the MasterTable and DetailTable. The best way to see these keys is in the Database Explorer. On my system I used the BDE Configuration Utility to create an alias called CUnleashed that points at the Data subdirectory. If you open this alias in the Database Explorer and go to MasterTable, you can see the primary and foreign keys, which Paradox calls Primary and Foreign Fields.
Why Use Referential Integrity?

Referential integrity is one of the most valuable tools in a database programmer's arsenal. In particular, referential integrity will help guide the user so that they do not accidentally enter invalid data, or accidentally delete needed records.

To see referential integrity in action, use the Database Desktop to enter two records in the MasterTable. The first should have the word Days in the Name field and the second should have the word Months in the Name field. You do not have to fill in the Code field, because it is an autoincrement field (+) and will be updated automatically.

Code Name

1 Days

2 Months

In the DetailTable, enter in a few names of days of the week or months of the year in the SubName field. Give the MasterCode field a 1 if you are entering a day, and 2 if you are entering a month.

Code MasterCode SubName

1 1 Monday

2 1 Tuesday

3 2 January

4 2 February

5 2 March

With this data in the tables, you could define a one-to-many relationship such that if you viewed the MasterTable record with Days in the Name field you would see only the days in the DetailTable, and if you selected Months, you would see only the month names from the DetailTable.

Referential integrity will do two things to help make sure that these tables stay in good shape. First, it will prevent you from deleting a record in the MasterTable that has detail records associated with it in the DetailTable. For instance, if you select the MasterTable, set the Database Desktop in Edit mode, and press Ctrl+Delete, you will not be able to delete a record that still has detail records. Second, referential integrity will prevent you from entering a value in the MasterCode field of the DetailTable that is not in the primary key of the MasterTable. For instance, if you tried to enter the number 3 in the DetailTable's MasterCode field, you would get the error message "Master field missing". This is because there is no record in the MasterTable with a Code field of 3. Of course, if you added a record to the MasterTable with a Code field that had 3 in it, the database would let you enter the data.

Needless to say, these rules are also enforced inside BCB. In your own programs, you might want to create exception handlers that would pop up messages that explained to the user exactly what was wrong, and why they could not perform a particular operation. Most users would not respond well to an exception that said no more than "Master field missing!"
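The two rules, and the idea of translating a terse error into something a user can act on, can be sketched in standard C++ as follows. This is only a model under stated assumptions: the MasterDetail class, the IntegrityError type, the tryOperation helper, and the "Detail records exist" message are all invented for illustration. In a real BCB program the checks are performed by the database engine, and you would catch the exception the VCL raises rather than one of your own.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical exception type standing in for the engine's integrity errors.
struct IntegrityError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical in-memory model of MasterTable and DetailTable.
class MasterDetail {
public:
    void addMaster(int code, const std::string& name) { master_[code] = name; }

    // MasterCode must match the Code of an existing master record.
    void addDetail(int code, int masterCode, const std::string& subName) {
        if (master_.count(masterCode) == 0)
            throw IntegrityError("Master field missing");
        detail_[code] = Detail{masterCode, subName};
    }

    // A master record that still has detail records cannot be deleted.
    void deleteMaster(int code) {
        for (const auto& d : detail_)
            if (d.second.masterCode == code)
                throw IntegrityError("Detail records exist");  // invented text
        master_.erase(code);
    }

    std::size_t masterCount() const { return master_.size(); }

private:
    struct Detail { int masterCode; std::string subName; };
    std::map<int, std::string> master_;
    std::map<int, Detail> detail_;
};

// Run an operation and turn a terse integrity error into a friendlier message.
template <typename Operation>
std::string tryOperation(Operation op) {
    try {
        op();
        return "OK";
    } catch (const IntegrityError& e) {
        return std::string("Cannot complete the operation: ") + e.what() +
               ". Check that the related record exists or has no dependents.";
    }
}
```

With the Days and Months records entered, calling tryOperation on a lambda that adds a detail record with a MasterCode of 3 returns the longer explanation instead of letting the raw exception reach the user.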

That is the end of my explanation of relational databases. In the last few pages you have learned about primary keys, foreign keys, indexes, referential integrity, and how all these pieces fit together to help you create robust applications. In the next few pages I will step you through some simple examples that illustrate these points.
One-to-Many Relationships: The Code

Now that you know something about the data in the Customer, Orders, Items, and Parts tables, it's time to link them together in a single program called Relate. To get started, begin a new project and add a data module to it. Place four TTable objects and four TDataSource objects on the data module, wire each data source to a TTable object, and then wire each of the TTable objects to one of the four tables mentioned earlier. You can also rename the TTable and TDataSource objects so that they correspond with their respective tables, as shown in Figure 12.13.

FIGURE 12.13. The data module for the Relate project.

Drop four TDBGrid objects on the main form for the project. Use the File | Include Unit Header menu option to link Form1 to DataModule1. Wire the grids to the datasources on the datamodule, making sure that each grid has its DataSource property assigned to a unique object. For instance, link the first grid to the Customer table, the second to the Orders table, and so on.

Using the names visible in Figure 12.13, click the OrdersTable component and set its MasterSource property equal to CustomerSource, that is, set its MasterSource equal to the TDataSource object that is linked to the TTable object that hosts the Customer table. Set the ItemsTable MasterSource property equal to OrdersSource and the PartsTable MasterSource equal to ItemsSource.

Click the OrdersTable MasterFields property and link up the Customer and Orders tables on the CustNo field, as described in Chapter 9, "Using TTable and TDataSet." In the same way, hook up ItemsTable to OrdersTable on the OrderNo field, and PartsTable to ItemsTable on the PartNo field. If you set all the tables to active and then run the program, the result should look like what you see in Figure 12.14.

Spend a little time mucking about with this program. Notice, for instance, that if you change the selected item in the Customer table, the contents of the grids showing the Orders, Items, and Parts tables will change. In particular, notice that the CustNo in all the items in the Orders table is always equal to the CustNo in the currently selected item in the Customer table. The same thing can be said about the OrderNo field in the Orders and Items tables, and the PartNo field in the Items and Parts tables.

In general, selecting one item at any level but the lowest in the hierarchy will force many detail records to change. That is why these are called one-to-many relationships. One record in the Orders table points to many records in the Items and Parts tables.
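What the linked grids are doing can be sketched in a few lines of standard C++. In the Relate program the VCL does this work for you through the MasterSource and MasterFields properties; the code below is only a model of the idea, and the cut-down Order and Item types and the function names are invented for illustration.

```cpp
#include <vector>

// Hypothetical, cut-down row types; only the linking fields are kept.
struct Order { int orderNo; int custNo; };
struct Item  { int orderNo; int itemNo; int partNo; };

// Detail rows for the selected customer: what the Orders grid would show.
std::vector<Order> ordersFor(const std::vector<Order>& orders, int custNo) {
    std::vector<Order> out;
    for (const Order& o : orders)
        if (o.custNo == custNo) out.push_back(o);
    return out;
}

// Detail rows for the selected order: what the Items grid would show.
std::vector<Item> itemsFor(const std::vector<Item>& items, int orderNo) {
    std::vector<Item> out;
    for (const Item& i : items)
        if (i.orderNo == orderNo) out.push_back(i);
    return out;
}
```

Each time the selected master record changes, the detail set is simply recomputed from the new key value, which is why moving one row in the Customer grid can change every grid below it.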

FIGURE 12.14. The Relate program at runtime.

NOTE: In this particular example, you might notice that the Parts table is always arranged in a one-to-one relationship with the Items table. If you reverse the order of these tables and make the Parts table the master, the arrangement will look more like a proper one-to-many relationship. It is not wrong, however, to make either table the master. The point is simply to arrange the tables so that you get the information from them that you want to obtain.

This discussion of the Relate program has given you a look at some of the important features in the Database Explorer and Database Desktop. It has also given you a quick run-down on some of the key ideas behind the construction of relational databases. The point here is that C++Builder has lots of built-in tools that help you construct relational databases. There is more that I want to say about these topics, even in this rather sketchy overview of a complicated subject. In particular, I have not yet talked about joins.
Relational Databases and Joins

In the last section, you saw how to relate the Customer, Orders, Items, and Parts tables in a one-to-many relationship that is sometimes called a master-detail relationship. In this section, you will again relate all four tables, but in a different kind of relationship, called a join.

You had a look at joins in the last chapter, "Working with Field Objects." This time the query that you need to build is a bit longer:

SELECT DISTINCT d.Company, d1.AmountPaid, d2.Qty,
  d3.Description, d3.Cost, d3.ListPrice
FROM "Customer.db" d, "Orders.db" d1,
  "Items.db" d2, "Parts.db" d3
WHERE (d1.CustNo = d.CustNo)
  AND (d2.OrderNo = d1.OrderNo)
  AND (d3.PartNo = d2.PartNo)
ORDER BY d.Company, d1.AmountPaid, d2.Qty,
  d3.Description, d3.Cost, d3.ListPrice

Though not horrendously complicated, the syntax shown here is still ugly enough to give some people pause.

The basic principles involved in this kind of statement are simple enough to describe. All that's happening is that the Customer, Orders, Items, and Parts tables are being joined together into one large table of the type you would have to create if you were trying to track all this information in a single flat-file database. The one proviso, of course, is that not all the fields from the four tables are being used. In fact, the only ones mentioned are

d.Company, d1.AmountPaid, d2.Qty, d3.Description, d3.Cost, d3.ListPrice

Here the d, d1, d2, and d3 are described in the following From clause:

"Customer.db" d, "Orders.db" d1, "Items.db" d2, "Parts.db" d3

The Order By clause, of course, simply defines the sort order to be used on the table created by this join. I am guilty here of using meaningless variable names. In general, you should choose identifiers more informative than d1 or d2.
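If the SQL is hard to picture, it may help to see the same matching logic written out by hand. The following is a rough sketch in standard C++ of what the Where clause asks for: every combination of rows whose CustNo, OrderNo, and PartNo values line up. The row types and the fourWayJoin function are invented for illustration, only a few of the selected columns are carried along, and the Distinct and Order By steps are left out. A real database engine would use its indexes instead of the nested loops shown here.

```cpp
#include <string>
#include <vector>

// Hypothetical, cut-down row types for the four tables.
struct Customer { int custNo; std::string company; };
struct Order    { int orderNo; int custNo; double amountPaid; };
struct Item     { int orderNo; int partNo; int qty; };
struct Part     { int partNo; std::string description; };

// One row of the flat table produced by the join.
struct JoinedRow {
    std::string company;
    double amountPaid;
    int qty;
    std::string description;
};

// Nested-loop join: keep each combination whose key fields match.
std::vector<JoinedRow> fourWayJoin(const std::vector<Customer>& customers,
                                   const std::vector<Order>& orders,
                                   const std::vector<Item>& items,
                                   const std::vector<Part>& parts) {
    std::vector<JoinedRow> rows;
    for (const Customer& c : customers)
        for (const Order& o : orders) {
            if (o.custNo != c.custNo) continue;        // d1.CustNo = d.CustNo
            for (const Item& i : items) {
                if (i.orderNo != o.orderNo) continue;  // d2.OrderNo = d1.OrderNo
                for (const Part& p : parts) {
                    if (p.partNo != i.partNo) continue; // d3.PartNo = d2.PartNo
                    rows.push_back({c.company, o.amountPaid, i.qty, p.description});
                }
            }
        }
    return rows;
}
```

Because one order can have several items, the order's AmountPaid value is copied into every joined row for that order. A customer with no orders contributes no rows at all.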

You can create a program that performs this join by dropping a TQuery, TDataSource, and TDBGrid on a form. Wire the objects together, wire the TQuery to the BCDEMOS database, and set its SQL property to the query shown previously. A sample program called FourWayJoin demonstrates this process. The output from the program is shown in Figure 12.15.

If you are not familiar with this kind of join, you might want to bring up the Relate and FourWayJoin programs side by side and compare them. Look, for instance, at the Action Club entries in the FourWayJoin program and trace them through so that you see how they correspond to the entries in the Relate program. Both programs describe an identical set of relationships; they just show the outcome in a different manner.

Notice that the AmountPaid column in the FourWayJoin program has the same number repeated twice in the Action Club section, as shown in Figure 12.15. In particular, the numbers $1,004.80 and $20,108 both appear twice. This is because there are two different items associated with these orders, as you can tell from glancing at the Parts table in the Relate program.

FIGURE 12.15. The FourWayJoin program demonstrates a join between four tables.

NOTE: Unless you are already familiar with this material, be sure to run the FourWayJoin and Relate programs and switch back and forth between them until you understand why the FourWayJoin program works as it does. I find it easy to understand the Relate program at a glance, but the FourWayJoin program is a bit more subtle.

Joins and QBE

The FourWayJoin program is a good advertisement for the power of SQL. Once you have the SQL statement composed, it is simple to put the program together. All the work is embodied in just a few lines of code, and everything else is trivial to construct. SQL can help concentrate the intelligence of a program in one small area--or at least it does in this one example.

The sticking point, of course, is that not everyone is a whiz at composing SQL statements. Even if you understand SQL thoroughly, it can still be confusing to try to string together all those interrelated Select, Order By, From, and Where clauses. What is needed here is a way to automate this process.

Most of the versions of C++Builder ship with a very useful tool that makes it easy to compose even relatively complex SQL statements. In particular, I'm talking about the QBE tool in the Database Desktop. If you want, you can use the Query Builder instead, or some other third-party tool that you might favor. However, in this section of the book, I will concentrate on the QBE tool, because it will be available to nearly all readers of this book. (QBE is also built into Paradox. Furthermore, there are some third-party QBE components on the market. The Query Builder only ships with the client/server version of C++Builder or Delphi.)

Start the DBD and set the Working Directory to the BCDEMOS alias. Choose File | New | QBE Query from the menu. A dialog will appear listing the tables in the BCDEMOS database. Select the Customer table. Reopen the Select File dialog by clicking the Add Table icon in the Toolbar. You can find the Add Table icon by holding the mouse over each icon until the fly-by help comes up or until you see the hint on the status bar. You can also simply look for the icon with the plus sign on it. Continue until you have added the Customer, Orders, Items, and Parts tables to the query. You can multiselect from inside the FileOpenDialog. Resize the query window until all four tables are visible, as shown in Figure 12.16.

FIGURE 12.16. Four tables used in a single QBE example.

To join these tables together, select the Join Tables icon, located just to the right of the lightning bolt. Click once on the Join Tables icon, and then click the CustNo fields for the Customer and Orders tables. The symbol "join1" will appear in each field. Click the Join Tables icon again, and link the Orders and Items tables on the OrderNo field. Join the Parts and Items tables on the PartNo field.

After joining the tables, select the fields you want to show by clicking once in the check box associated with the fields you want to view. When you are done, the result should look like Figure 12.17.

FIGURE 12.17. The complete QBE query for joining the Customer, Orders, Items, and Parts tables.

To test your work, click the lightning bolt icon once. You should get a table that looks just like the one in the FourWayJoin program. You will find a copy of this QBE query in the Chap12 directory on the CD-ROM that accompanies this book.

To translate the QBE statement into SQL, first close the result table so you can view the query shown in Figure 12.17. Click once on the SQL icon to perform the translation. You can save this SQL to disk, or just block-copy it and deposit it in the SQL property of a TQuery object.

On paper, this process takes a few minutes to explain. However, once you understand the QBE tool, you can use it to relate multiple tables in just a very few seconds. For most people, QBE is probably the simplest and fastest way to compose all your SQL Select statements. Don't neglect learning to use this tool. It's a simple, easy-to-use tool that can save you hours of time.

NOTE: The only peculiarity of the QBE tool is that by default it saves its output in a text-based language called QBE, rather than in SQL. However, once you press the SQL button, it converts the QBE code to SQL, thereby rendering the exact same results produced by standard SQL query builders. Once again, the great advantage of the QBE tool over other SQL tools is that it ships with the DBD product that accompanies nearly all versions of C++Builder. If you have access to a more powerful SQL builder, you might want to use it instead of the QBE tool. However, QBE works fine in most circumstances, even when running against SQL data in an InterBase table.

What I have said here is, of course, heresy to many members of the hard-core client/server crowd. They tend to have a natural aversion to QBE, just as C++ and Object Pascal programmers shy away from BASIC. However, QBE ships for free with both Paradox and the Database Desktop, and it will meet the needs of ninety percent, but not all, of the programmers out there. So it's worth a look, yes?

That's it for the discussion of the basic principles of relational databases. You've seen how to build master-detail relationships, and how to construct joins. More importantly, you've seen how C++Builder encapsulates these key aspects of relational database design. There is, of course, much more to the theory of relational databases. There are whole books on this subject, particularly on the best way to design relational databases.
Which Database Should I Use?

If you are not sure of which database to use, I would tentatively suggest using Paradox to get started. It has a robust set of rules for enforcing data integrity, a rich set of types, and some nice features such as autoincrement fields. It works fine on a network, as long as everyone can attach their PCs to one centralized server and you aren't expecting a large number of simultaneous users.

If you are expecting 30 or more simultaneous users, I would bite the bullet financially and switch to InterBase or to another standard SQL server such as Oracle, Sybase, or MS SQL Server. You could have a hundred or even two hundred users hitting a Paradox table at the same time, but I wouldn't recommend it. If you have a hundred users, but only ten or fifteen are likely to be after a table at one time, I would still feel comfortable with Paradox, though I would start leaning in the direction of a real client/server database.

Client/server databases such as InterBase will

Let you talk to the server over TCP/IP, or some other network protocol

Put much of the computational burden on the server rather than the client machine

Prove to be much more robust

Enable you to use stored procedures, triggers, views, and other advanced server-side technologies

Remember that when I make suggestions about databases or about anything else, I am usually not so much trying to establish a definitive standard as I am trying to give reasonable advice to those readers who are not sure which way to turn.
Summary

My suggestion at this point is to dig into relational databases and learn as much about them as you can. Raw data sitting on a disk is boring. Rows of data in a grid are boring. Relational databases, however, are innately interesting. This is the fun part of database programming. Play around with indexes, or play around with joins and one-to-many relationships. The name of the game here is to find ways to arrange data in relational tables so that you can get at it easily. When you arrange data correctly, it's amazing to see how quickly you can locate very obscure pieces of information. In fact, a number of very fun games, such as Civilization or the Ultima series, rely heavily on databases in order to further the game play. Take some time to dig into this stuff. It's more interesting than you might think.

If you are wishing that I had spent more time on InterBase tables, don't worry, because I cover that topic heavily later in the book. Much of the material covered in this chapter will be reviewed again, in much shorter form, in the light of the InterBase server.

©Copyright, Macmillan Computer Publishing. All rights reserved.